Amendments to the Claims:

This listing of claims will replace all prior versions and listings of claims in the application: 
Listing of Claims: 

1. (Currently amended) In a data processing system, a method for creating a
database from information found on a plurality of web pages, said information comprising global
regularities and local regularities, said global regularities being patterns that are expected to be
found in all said web pages, and said local regularities being patterns which are not expected to
be found in all said web pages, said method comprising:

a) providing a global classifier using said global regularities;

b) identifying a candidate subset of the web pages expected to have said local
regularities; thereafter

c) tentatively identifying and tagging, in said candidate subset of the web pages,
elements having said global regularities, by using said global classifier to obtain first
tentative labels;

d) training a local classifier using said first tentative labels, said local
classifier using said local regularities for its classification;

e) tentatively identifying elements having specific combinations of said
global regularities and said local regularities using said global classifier and said
local classifier to obtain second tentative labels for said elements of said candidate
subset; and thereafter

f) outputting said second tentative labels as permanent labels associated with
said elements of said candidate subset of web pages.
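
For illustration only, steps a) through f) of claim 1 can be sketched in Python. Everything named here is a hypothetical stand-in and not taken from the application: the keyword rule playing the role of the global classifier, the formatting string playing the role of a local regularity, and the majority-vote rule used to train and apply the local classifier. The combination rule in step e) (the local prediction wins where a local feature is known) is likewise only one assumed possibility.

```python
from collections import Counter, defaultdict

# Hypothetical element: (text, local_feature). The local feature stands in for
# page-specific formatting, such as an HTML tag path.

def global_classifier(text):
    """Toy global regularity (assumed): job titles contain certain keywords."""
    keywords = ("engineer", "manager", "analyst")
    return "title" if any(k in text.lower() for k in keywords) else "other"

def train_local_classifier(elements, tentative_labels):
    """Learn, per local feature, the majority tentative label."""
    votes = defaultdict(Counter)
    for (_, feature), label in zip(elements, tentative_labels):
        votes[feature][label] += 1
    return {f: counts.most_common(1)[0][0] for f, counts in votes.items()}

def label_candidate_subset(elements):
    # Step c): first tentative labels from the global classifier.
    first = [global_classifier(text) for text, _ in elements]
    # Step d): train the local classifier on the first tentative labels.
    local_model = train_local_classifier(elements, first)
    # Step e): combine both classifiers; the local prediction is used when the
    # local feature was seen in training, otherwise the global label is kept.
    second = [local_model.get(feature, g) for (_, feature), g in zip(elements, first)]
    # Step f): the second tentative labels would be output as permanent labels.
    return first, second
```

In this sketch an element such as ("Webmaster", "h2.job") is missed by the keyword rule but picked up in the second pass, because most elements sharing its formatting were tentatively labeled as titles.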

2. (Currently amended) The method of claim 1 further including:

g) deciding whether to retrain said local classifier with said second
tentative labels.

3. (Currently amended) The method according to claim 2, further including:

h) training the local classifier using said second tentative labels.

4. (Currently amended) The method according to claim 2, further including:

h) collecting said permanent labels associated with said elements of said
candidate subset of web pages;

i) training said global classifier in response to said permanent labels.

5. (Currently amended) The method according to claim 1 wherein said
second classifier treats selected global regularities differently than said global classifier
treats said global regularities such that said local regularities contradict said
global regularities.

6. (Currently amended) The method according to claim 5 wherein said
outputting step further includes ignoring training results of said global classifier.

7. (Currently amended) The method according to claim 5 wherein said
outputting step further includes combining training results of said global classifier and said
local classifier.

8. (Currently amended) In a data processing system, a method for learning
and combining global regularities and local regularities for information extraction and
classification, said global regularities being patterns which may be found over an entire dataset
and said local regularities being patterns found in less than the entire dataset, said method
comprising the steps of:

a) identifying a candidate subset of the dataset in which said local regularities
may be found; thereafter

b) tentatively identifying elements having said global regularities in said
candidate subset to obtain first tentative labels, said first tentative labels being useful for tagging
information having identifiable similarities; thereafter

c) attaching said first tentative labels onto said identified elements of said
candidate subset; thereafter

d) employing said attached first tentative labels via one of a class of inductive
operations to formulate first local regularities; thereafter

e) tentatively identifying elements having specific combinations of said global
regularities and said first local regularities to obtain attached second tentative labels; thereafter

f) rating confidence of said
attached second tentative labels and converting selected ones of said attached second tentative
labels to confidence labels upon achieving a preselected confidence level; and then

outputting data with said confidence labels; otherwise

g) employing said second tentative labels via said operation on said candidate
subset to formulate second local regularities, and

h) repeating from step e) until said confidence labels have been fully
developed.
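
For illustration only, the loop of steps d) through h) of claim 8 might be sketched as follows. The inductive operation is assumed here to be a simple per-feature vote count, the parameters `min_share` and `min_confidence` are hypothetical, and "fully developed" is read, as an assumption, to mean that a further pass changes neither the tentative labels nor the confidence labels. The claim itself does not prescribe any of these choices.

```python
from collections import Counter, defaultdict

def develop_confidence_labels(features, first_labels,
                              min_share=0.6, min_confidence=0.9, max_rounds=10):
    """features[i] is a hypothetical local feature of element i;
    first_labels[i] is its first tentative label from the global regularities."""
    labels = list(first_labels)
    confident = [None] * len(labels)
    for _ in range(max_rounds):
        # Steps d) and g): formulate local regularities from the current labels.
        votes = defaultdict(Counter)
        for f, l in zip(features, labels):
            votes[f][l] += 1
        # Step e): re-identify elements; adopt the local majority when it is
        # strong enough, otherwise keep the current label.
        new_labels, ratings = [], []
        for f, l in zip(features, labels):
            top, count = votes[f].most_common(1)[0]
            total = sum(votes[f].values())
            if count / total >= min_share:
                new_labels.append(top)
                ratings.append(count / total)
            else:
                new_labels.append(l)
                ratings.append(votes[f][l] / total)
        # Step f): rate confidence and convert labels that reach the
        # preselected confidence level into confidence labels.
        new_confident = [l if r >= min_confidence else None
                         for l, r in zip(new_labels, ratings)]
        # Step h): stop once another pass changes nothing (assumed reading).
        if new_confident == confident and new_labels == labels:
            break
        labels, confident = new_labels, new_confident
    return confident
```

Elements whose feature group never reaches the confidence level are returned as `None`, meaning that no confidence label is output for them.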

9. (Original) The method according to claim 8 wherein said initial global 
regularity providing step comprises manually inputting descriptions of said global regularities. 



Appln. No. 09/771,008 PATENT

Amdt. dated June 4, 2004 

Reply to Office Action of December 4, 2003 

10. (Currently amended) The method according to claim 8 wherein said 
initial global regularity providing step comprises obtaining said global regularities from a further 
one of said class of said inductive operations that has been applied to a subset of said dataset, 
said subset of said dataset having been manually labeled. 

11. (Currently amended) The method according to claim 10 further including
developing refined global regularities comprising the steps of:

i) collecting confidence labels from at least one of said candidate subsets to
obtain global confidence labels;

j) employing said global confidence labels on candidate subsets along with said
manually labeled dataset via one of said class of inductive operations to formulate said refined
global regularities;

k) providing descriptions of said refined global regularities to said working
database; thereafter

l) identifying a next candidate subset of the dataset in which local regularities
may be found; thereafter

m) tentatively identifying elements having said refined global regularities in
said candidate subset to obtain next tentative labels; thereafter

n) attaching said next tentative labels onto said identified elements of said next
candidate subset; thereafter

o) employing said attached next tentative labels via one of the class of inductive
operations to formulate next local regularities; thereafter

p) tentatively identifying elements having specific combinations of said refined
global regularities and said next local regularities to obtain attached next second tentative labels;
thereafter

q) rating confidence of said
attached next second tentative labels and converting selected ones of said attached next second
tentative labels to confidence labels upon achieving a preselected confidence level; and then

r) outputting data with said confidence labels; otherwise

s) employing said next second tentative labels via said operation on said candidate
subset to formulate next second local regularities, and

t) repeating from step p).
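
For illustration only, steps i) through k) of claim 11 (collecting confidence labels and using them to refine the global regularities) might look like the following sketch. The representation of global regularities as a keyword set, the label name "title", and the `min_count` threshold are all hypothetical. The claim leaves the choice of inductive operation open.

```python
from collections import Counter

def refine_global_keywords(keywords, confident, min_count=2):
    """Refine a keyword-based global regularity (assumed representation).

    confident: list of (text, label) pairs collected from candidate subsets,
    optionally merged with a manually labeled set.
    A token is added when it occurs in at least min_count 'title' texts
    and in no text carrying another label.
    """
    pos, neg = Counter(), Counter()
    for text, label in confident:
        tokens = set(text.lower().split())
        (pos if label == "title" else neg).update(tokens)
    learned = {t for t, c in pos.items() if c >= min_count and neg[t] == 0}
    # The refined set would then be provided to the working database (step k).
    return set(keywords) | learned
```

The refined keyword set can then drive the next pass over a new candidate subset, which is the role steps l) through t) play in the claim.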

12. (Original) The method according to claim 11 further including the
steps of:

applying said data with confidence labels to further subsets of said dataset to
investigate further subsets for local regularities.

13. (Currently amended) The method according to claim 8 further including 
developing refined global regularities comprising the steps of: 

i) collecting confidence labels from at least one of said candidate subsets to
obtain global confidence labels;

j) employing said global confidence labels on candidate subsets via one of said
class of inductive operations to formulate said refined global regularities;

k) providing descriptions of said refined global regularities to said working
database; thereafter

l) identifying a next candidate subset of the dataset in which local regularities
may be found; thereafter

m) tentatively identifying elements having said refined global regularities in
said candidate subset to obtain next tentative labels; thereafter

n) attaching said next tentative labels onto said identified elements of said next
candidate subset; thereafter

o) employing said attached next tentative labels via one of the class of inductive
operations to formulate next local regularities; thereafter

p) tentatively identifying elements having specific combinations of said refined
global regularities and said next local regularities to obtain attached next second tentative labels;
thereafter

q) rating confidence of said
attached next second tentative labels and converting selected ones of said attached next second
tentative labels to confidence labels upon achieving a preselected confidence level; and then

r) outputting data with said confidence labels; otherwise

s) employing said next second tentative labels via said operation on said candidate
subset to formulate next second local regularities; and

t) repeating from step p).

14. (Original) The method according to claim 13 further including the
steps of:

applying said data with confidence levels to further subsets of said dataset to
investigate further subsets for local regularities.

15. (Currently amended) In a data processing system, a method for learning
and combining regularities of a first level and regularities of at least a second level and a third
level for information extraction and classification, said first, second and third levels having a
hierarchy from most global to most specific, said method comprising the steps of:

a) beginning at the most global level, training a classifier at the selected level by
initially providing descriptions of regularities at said selected level to a working database, said
selected level regularities being patterns which are to be found over a selected portion of
a selected dataset corresponding to the selected level; thereafter

b) identifying a candidate subset of the selected dataset in which next more
specific regularities may be found; thereafter

c) tentatively identifying elements having said selected regularities in said
candidate subset to obtain first tentative labels, said first tentative labels being useful for tagging
mutually similar information; thereafter

d) attaching said first tentative labels onto said identified elements of said
candidate subset; thereafter



e) employing said attached first tentative labels via one of a class of inductive
operations to formulate first local regularities; thereafter

f) tentatively identifying elements having specific combinations of said global
regularities and said local regularities to obtain attached second tentative labels; thereafter

g) rating confidence of said
attached second tentative labels and converting selected ones of said attached second tentative
labels to confidence labels upon achieving a preselected confidence level; and then

h) outputting data with said confidence labels; otherwise

j) employing said second tentative labels via said operation on said candidate
subset to formulate second more specific regularities;

k) repeating from step f); and

l) repeating from step a) for each successive more selective level of
regularity.
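
For illustration only, the level-by-level procedure of claim 15 might be sketched as below. The element fields (`site`, `page`, `style`, `text`), the grouping keys that define each level, and the majority-share rule are hypothetical choices made for the sketch, and the confidence rating of step g) is omitted for brevity.

```python
from collections import Counter, defaultdict

def refine_by_group(elements, labels, key, min_share=0.6):
    """Within each group defined by `key`, adopt the majority label
    when its share of the group is at least min_share."""
    groups = defaultdict(list)
    for i, el in enumerate(elements):
        groups[key(el)].append(i)
    out = list(labels)
    for idx in groups.values():
        top, count = Counter(labels[i] for i in idx).most_common(1)[0]
        if count / len(idx) >= min_share:
            for i in idx:
                out[i] = top
    return out

def label_hierarchy(elements, global_rule, level_keys, max_rounds=10):
    # Most global level: labels from the global regularities.
    labels = [global_rule(el) for el in elements]
    # Each successive key defines a more specific level (for example site, then page).
    for key in level_keys:
        for _ in range(max_rounds):
            new_labels = refine_by_group(elements, labels, key)
            if new_labels == labels:  # nothing more to learn at this level
                break
            labels = new_labels
    return labels
```

A caller might pass `level_keys=[lambda e: (e["site"], e["style"]), lambda e: (e["site"], e["page"], e["style"])]` so that regularities learned for a whole site are applied before regularities specific to a single page.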